SIMD Vectorization of Straight Line FFT Code

نویسندگان

Stefan Kral

Franz Franchetti

Juergen Lorenz

Christoph W. Ueberhuber

چکیده

This paper presents compiler technology that targets general purpose microprocessors augmented with SIMD execution units for exploiting data level parallelism. FFT kernels are accelerated by automatically vectorizing blocks of straight line code for processors featuring two-way short vector SIMD extensions like AMD’s 3DNow! and Intel’s SSE 2. Additionally, a special compiler backend is introduced which is able to (i) utilize particular code properties, (ii) generate optimized address computation, and (iii) apply specialized register allocation and instruction scheduling. Experiments show that automatic SIMD vectorization can achieve performance that is comparable to the optimal hand-generated code for FFT kernels. The newly developed methods have been integrated into the codelet generator of Fftw and successfully vectorized complicated code like real-to-halfcomplex non-power-of-two FFT kernels. The floatingpoint performance of Fftw’s scalar version has been more than doubled, resulting in the fastest FFT implementation to date.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

FFT Compiler Techniques

This paper presents compiler technology that targets general purpose microprocessors augmented with SIMD execution units for exploiting data level parallelism. Numerical applications are accelerated by automatically vectorizing blocks of straight line code to be run on processors featuring two-way short vector SIMD extensions like Intel’s SSE 2 on Pentium 4, SSE 3 on Intel Prescott, AMD’s 3DNow...

متن کامل

Short Vector SIMD Code Generation for DSP Algorithms

Short vector SIMD instructions on recent general purpose microprocessors, such as SSE on Pentium III and 4, offer a high potential speed-up but require a very high level of programming expertise. We present a compiler that generates vectorized code for digital signal processing algorithms such as the fast Fourier transform (FFT). The input to our compiler is a mathematical description of the al...

متن کامل

Vectorization Techniques for BlueGene/L’s Double FPU

This paper presents vectorization techniques tailored to meet the specifics of the twoway single-instruction multiple-data (SIMD) double-precision floating-point unit, which is a core element of the node ASICs of IBM's 360 Tflop/s supercomputer BlueGene/L. The paper focuses on the general-purpose basic-block vectorization methods provided by the Vienna MAP vectorizer. In addition, the paper int...

متن کامل

Architecture independent short vector FFTs

This paper introduces a SIMD vectorization for FFTW—the “fastest Fourier transform in the west” by Matteo Frigo and Steven Johnson. The new method leads to an architecture independent short vector SIMD FFT vectorization that utilizes the architecture adaptivity of FFTW. It is based on special FFT kernels (up to size 64 and more) that are utilized by FFTW to compute the whole transform. This vec...

متن کامل

An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors

In the present paper, an implementation of a parallel one-dimensional fast Fourier transform (FFT) using Streaming SIMD Extensions 3 (SSE3) instructions on dual-core processors is proposed. Combination of vectorization and the block six-step FFT algorithm is shown to effectively improve performance. The performance results for one-dimensional FFTs on dual-core Intel Xeon processors are reported...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2003

SIMD Vectorization of Straight Line FFT Code

نویسندگان

چکیده

منابع مشابه

FFT Compiler Techniques

Short Vector SIMD Code Generation for DSP Algorithms

Vectorization Techniques for BlueGene/L’s Double FPU

Architecture independent short vector FFTs

An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors

عنوان ژورنال:

اشتراک گذاری